
Conversation

@Manik2708
Contributor

Which problem is this PR solving?

Description of the changes

  • Refactor the internal methods of integration tests to read, write and compare ptrace.Traces directly.

How was this change tested?

  • Integration Tests

Checklist

@Manik2708 Manik2708 requested a review from a team as a code owner January 1, 2026 10:19
@Manik2708 Manik2708 requested a review from albertteoh January 1, 2026 10:19
@Manik2708 Manik2708 marked this pull request as draft January 1, 2026 10:19
@dosubot dosubot bot added the area/storage label Jan 1, 2026
@Manik2708
Contributor Author

Currently fixing the ES tests

Signed-off-by: Manik Mehta <[email protected]>
Signed-off-by: Manik Mehta <[email protected]>
Signed-off-by: Manik Mehta <[email protected]>
@codecov

codecov bot commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.50%. Comparing base (0954788) to head (482ee27).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7812      +/-   ##
==========================================
+ Coverage   95.49%   95.50%   +0.01%     
==========================================
  Files         305      305              
  Lines       16174    16174              
==========================================
+ Hits        15445    15447       +2     
+ Misses        571      570       -1     
+ Partials      158      157       -1     
| Flag | Coverage Δ |
|---|---|
| badger_v1 | 8.91% <ø> (-0.28%) ⬇️ |
| badger_v2 | 1.27% <ø> (-0.67%) ⬇️ |
| cassandra-4.x-v1-manual | 13.32% <ø> (-0.28%) ⬇️ |
| cassandra-4.x-v2-auto | 1.27% <ø> (-0.66%) ⬇️ |
| cassandra-4.x-v2-manual | 1.27% <ø> (-0.66%) ⬇️ |
| cassandra-5.x-v1-manual | 13.32% <ø> (-0.28%) ⬇️ |
| cassandra-5.x-v2-auto | 1.27% <ø> (-0.66%) ⬇️ |
| cassandra-5.x-v2-manual | 1.27% <ø> (-0.66%) ⬇️ |
| clickhouse | 1.23% <ø> (-0.75%) ⬇️ |
| elasticsearch-6.x-v1 | 17.00% <ø> (-0.55%) ⬇️ |
| elasticsearch-7.x-v1 | 17.03% <ø> (-0.55%) ⬇️ |
| elasticsearch-8.x-v1 | 17.18% <ø> (-0.55%) ⬇️ |
| elasticsearch-8.x-v2 | 1.27% <ø> (-0.67%) ⬇️ |
| elasticsearch-9.x-v2 | 1.27% <ø> (-0.67%) ⬇️ |
| grpc_v1 | 8.32% <ø> (-0.54%) ⬇️ |
| grpc_v2 | 1.27% <ø> (-0.67%) ⬇️ |
| kafka-3.x-v2 | 1.27% <ø> (-0.67%) ⬇️ |
| memory_v2 | 1.27% <ø> (-0.67%) ⬇️ |
| opensearch-1.x-v1 | 17.07% <ø> (-0.55%) ⬇️ |
| opensearch-2.x-v1 | 17.07% <ø> (-0.55%) ⬇️ |
| opensearch-2.x-v2 | 1.27% <ø> (-0.67%) ⬇️ |
| opensearch-3.x-v2 | 1.27% <ø> (-0.67%) ⬇️ |
| query | 1.27% <ø> (-0.67%) ⬇️ |
| tailsampling-processor | 0.55% <ø> (-0.01%) ⬇️ |
| unittests | 94.15% <ø> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@github-actions

github-actions bot commented Jan 2, 2026

Metrics Comparison Summary

Total changes across all snapshots: 0

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 0

  • 🆕 Added: 0 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics
  • 🚫 Excluded: 53 metrics

➡️ View full metrics file

@Manik2708 Manik2708 marked this pull request as ready for review January 2, 2026 15:20
@Manik2708
Contributor Author

Manik2708 commented Jan 2, 2026

The majority of tests are passing, except Kafka. Working on it! @yurishkuro, could you please review it?

Signed-off-by: Manik Mehta <[email protected]>
@Manik2708
Contributor Author

Manik2708 commented Jan 2, 2026

A question about Kafka: does Kafka also assign one span per resource span, like ES/OS? Also, I can't understand the encoding part: how is Kafka tested? I mean, how is it different from the other storage tests?

return 0
}

func checkSize(t *testing.T, expected *model.Trace, actual *model.Trace) {
Contributor Author

no need to check size, ptracetest.CompareTraces checks for us

ptracetest.IgnoreSpansOrder(),
}
if err := ptracetest.CompareTraces(expected, actual, options...); err != nil {
t.Logf("Actual trace and expected traces are not equal: %v", err)
Contributor Author

no need for a pretty diff; CompareTraces gives the first point of difference as the error

Member

have you tried comparing JSON strings instead? I think it's nice to get a full dump of diffs, not the first breaking point.

Example:

import (
    "fmt"

    "github.com/hexops/gotextdiff"
    "github.com/hexops/gotextdiff/myers"
    "github.com/hexops/gotextdiff/span"
)

func DiffStrings(want, got string) string {
    edits := myers.ComputeEdits(span.URIFromPath("want.json"), want, got)
    diff := gotextdiff.ToUnified("want.json", "got.json", want, edits)
    return fmt.Sprint(diff)
}
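
A possible way to wire this up (not from this PR; logTraceDiff is a hypothetical helper, and pdata's ptrace.JSONMarshaler is assumed for producing the JSON strings; the attribute-ordering caveat raised later in this thread still applies):

import (
    "bytes"
    "encoding/json"
    "testing"

    "go.opentelemetry.io/collector/pdata/ptrace"
)

// logTraceDiff marshals both traces to indented JSON and logs the full
// unified diff produced by DiffStrings above.
func logTraceDiff(t *testing.T, expected, actual ptrace.Traces) {
    marshaler := ptrace.JSONMarshaler{}
    wantJSON, err := marshaler.MarshalTraces(expected)
    if err != nil {
        t.Fatal(err)
    }
    gotJSON, err := marshaler.MarshalTraces(actual)
    if err != nil {
        t.Fatal(err)
    }
    var wantBuf, gotBuf bytes.Buffer
    _ = json.Indent(&wantBuf, wantJSON, "", "  ")
    _ = json.Indent(&gotBuf, gotJSON, "", "  ")
    t.Log(DiffStrings(wantBuf.String(), gotBuf.String()))
}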

return bytes.Compare(aAttrs[:], bAttrs[:])
}

func compareTimestamps(a, b pcommon.Timestamp) int {
Contributor Author

we could directly subtract the timestamps, but that would require an unnecessary and risky conversion of uint64 to int
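
A minimal sketch of that comparison (illustrative only, relying on pcommon.Timestamp being an ordered unsigned type; the PR's actual body may differ):

import "go.opentelemetry.io/collector/pdata/pcommon"

// compareTimestamps orders two timestamps without converting uint64 to int,
// so there is no overflow risk from the conversion mentioned above.
func compareTimestamps(a, b pcommon.Timestamp) int {
    switch {
    case a < b:
        return -1
    case a > b:
        return 1
    default:
        return 0
    }
}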

t.Log(err)
return false
}
if len(expected) != len(traces) {
Contributor Author

We don't know how many traces the reader will return in one slice, so this check becomes useless

@Manik2708
Contributor Author

Manik2708 commented Jan 2, 2026

Another problem is that for TestGetLargeTrace, in some tests the loaded and expected traces are the same, like: https://github.com/jaegertracing/jaeger/actions/runs/20660935489/job/59323122666?pr=7812#step:7:547, whereas in other tests the difference is very high, like: https://github.com/jaegertracing/jaeger/actions/runs/20660935489/job/59323122693?pr=7812#step:7:784. I can't figure out the exact reason or how it is linked to the conversion. Initially I thought it might be related to normalization, but the same issue exists in memory storage, where there is no normalization.

Signed-off-by: Manik Mehta <[email protected]>
@yurishkuro
Member

Also I can't get how this is linked to this refactoring!

Are you generating more verbose traces which exceeded the message limit?

There should be settings in both the Kafka broker and the exporter for max message size. Alternatively, we can use a smaller batch in the collector config; there's no reason to send all 1000 spans as a single message.

@Manik2708
Contributor Author

Manik2708 commented Jan 7, 2026

Also I can't get how this is linked to this refactoring!

Are you generating more verbose traces which exceeded the message limit?

There should be settings in both the Kafka broker and the exporter for max message size. Alternatively, we can use a smaller batch in the collector config; there's no reason to send all 1000 spans as a single message.

  1. The difference in sending traces is that traces are normalized to 1 resource span per span. Kafka has 4 tests; the OTLP tests don't require normalization, but the Jaeger proto tests do require it. I tried turning off normalization, and the otel_json test is still failing.
  2. The default value of send_batch_size is 8192; should we reduce it? If yes, what would be a good number? AI suggests 1024 for send_batch_size and 2048 for send_batch_max_size; should I try these numbers?

@yurishkuro
Member

I remember there was an outstanding ticket in upstream OTEL contrib about Kafka exporter not respecting max batch size / message size, i.e. the batch processor can make the message bigger by aggregating multiple ptrace.Traces but could not make one huge ptrace.Traces payload smaller. There was a PR trying to solve that but I don't recall seeing it merged.

@Manik2708
Contributor Author

I remember there was an outstanding ticket in upstream OTEL contrib about Kafka exporter not respecting max batch size / message size, i.e. the batch processor can make the message bigger by aggregating multiple ptrace.Traces but could not make one huge ptrace.Traces payload smaller. There was a PR trying to solve that but I don't recall seeing it merged.

yes, the issue is still there!

_ io.Closer = (*traceWriter)(nil)

MaxChunkSize = 35 // max chunk size otel kafka export can handle safely.
MaxChunkSize = 5 // max chunk size otel kafka export can handle safely.
Copy link
Contributor Author

@Manik2708 Manik2708 Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reducing it to this value made the tests pass

@Manik2708
Contributor Author

@yurishkuro Please review

loadAndParseJSONPB(t, fileName, &trace)
return &trace
// getNormalisedTraces normalise traces and assign one resource span to one span
func getNormalisedTraces(td ptrace.Traces) ptrace.Traces {
Member

we usually use American spelling

Suggested change
func getNormalisedTraces(td ptrace.Traces) ptrace.Traces {
func getNormalizedTraces(td ptrace.Traces) ptrace.Traces {

I am not clear why this function is needed.

Contributor Author

Some storages, like Elasticsearch and ClickHouse, take traces and save spans with the scope and resource embedded in them, so the reader returns one resource span per span. That means spans having the same resource are not under the same resource span, so in the integration tests we need to normalize the fixtures to follow this. Some fixtures group spans with the same resource into the same resource span, which fails the tests.
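
To make the reshaping concrete, a minimal sketch of such a normalization (illustrative only, based on the description above; the helper in this PR may differ in details):

import "go.opentelemetry.io/collector/pdata/ptrace"

// getNormalizedTraces rewrites td so that every span sits under its own
// ResourceSpans/ScopeSpans, mirroring what ES/ClickHouse readers return.
func getNormalizedTraces(td ptrace.Traces) ptrace.Traces {
    out := ptrace.NewTraces()
    for i := 0; i < td.ResourceSpans().Len(); i++ {
        rs := td.ResourceSpans().At(i)
        for j := 0; j < rs.ScopeSpans().Len(); j++ {
            ss := rs.ScopeSpans().At(j)
            for k := 0; k < ss.Spans().Len(); k++ {
                newRS := out.ResourceSpans().AppendEmpty()
                rs.Resource().CopyTo(newRS.Resource())
                newSS := newRS.ScopeSpans().AppendEmpty()
                ss.Scope().CopyTo(newSS.Scope())
                ss.Spans().At(k).CopyTo(newSS.Spans().AppendEmpty())
            }
        }
    }
    return out
}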

ptracetest.IgnoreSpansOrder(),
}
if err := ptracetest.CompareTraces(expected, actual, options...); err != nil {
t.Logf("Actual trace and expected traces are not equal: %v", err)
Member

have you tried comparing JSON strings instead? I think it's nice to get a full dump of diffs, not the first breaking point.

Example:

import (
    "fmt"

    "github.com/hexops/gotextdiff"
    "github.com/hexops/gotextdiff/myers"
    "github.com/hexops/gotextdiff/span"
)

func DiffStrings(want, got string) string {
    edits := myers.ComputeEdits(span.URIFromPath("want.json"), want, got)
    diff := gotextdiff.ToUnified("want.json", "got.json", want, edits)
    return fmt.Sprint(diff)
}

@Manik2708
Contributor Author

Manik2708 commented Jan 10, 2026

#7812 (comment) The problem with comparing strings is attributes. There is no way to sort attributes. I tried unmarshalling traces and then comparing bytes, but attribute order became the problem.

@Manik2708 Manik2708 requested a review from yurishkuro January 10, 2026 19:05
@Manik2708
Contributor Author

@yurishkuro can you please see this comment #7812 (comment)

@yurishkuro yurishkuro added the changelog:ci Change related to continuous integration / testing label Jan 12, 2026
// from the current UTC date, while preserving the original time
// This is required in integration tests because the fixtures have
// hardcoded start times and other timestamps, and we need to make
// them recent so they can be fetched by the reader.
Member

we need to make them recent so they can be fetched by the reader

Did similar enrichment exist with v1 model fixtures?

Contributor Author

yes, here it is:

func correctTime(jsonData []byte) []byte {
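
For comparison, a ptrace-based version of that enrichment could look roughly like the sketch below (illustrative only; shiftToCurrentDate is a hypothetical name, and it assumes "preserving the original time" means keeping the time of day while moving spans onto the current UTC date):

import (
    "time"

    "go.opentelemetry.io/collector/pdata/pcommon"
    "go.opentelemetry.io/collector/pdata/ptrace"
)

// shiftToCurrentDate moves every span's start/end timestamp onto the current
// UTC date while keeping the original time of day.
func shiftToCurrentDate(td ptrace.Traces) {
    now := time.Now().UTC()
    shift := func(ts pcommon.Timestamp) pcommon.Timestamp {
        orig := ts.AsTime()
        moved := time.Date(now.Year(), now.Month(), now.Day(),
            orig.Hour(), orig.Minute(), orig.Second(), orig.Nanosecond(), time.UTC)
        return pcommon.NewTimestampFromTime(moved)
    }
    for i := 0; i < td.ResourceSpans().Len(); i++ {
        rs := td.ResourceSpans().At(i)
        for j := 0; j < rs.ScopeSpans().Len(); j++ {
            spans := rs.ScopeSpans().At(j).Spans()
            for k := 0; k < spans.Len(); k++ {
                span := spans.At(k)
                span.SetStartTimestamp(shift(span.StartTimestamp()))
                span.SetEndTimestamp(shift(span.EndTimestamp()))
            }
        }
    }
}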

require.NoError(t, err)
}

func mergeTraces(traceIter [][]ptrace.Traces) ptrace.Traces {
Member

please use MergeTraces from ./internal/jptrace/aggregator.go

Contributor Author

but that MergeTraces aims to solve a different problem. It copies all resource spans from a source to a destination, whereas here we need to merge all the traces from a slice
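
For illustration, a minimal sketch of that local helper under this interpretation (flattening every chunk returned by the reader into one ptrace.Traces; not necessarily the exact implementation in this PR):

import "go.opentelemetry.io/collector/pdata/ptrace"

// mergeTraces copies every ResourceSpans from every chunk into a single
// ptrace.Traces so the result can be compared with one CompareTraces call.
func mergeTraces(traceIter [][]ptrace.Traces) ptrace.Traces {
    merged := ptrace.NewTraces()
    for _, chunk := range traceIter {
        for _, td := range chunk {
            for i := 0; i < td.ResourceSpans().Len(); i++ {
                td.ResourceSpans().At(i).CopyTo(merged.ResourceSpans().AppendEmpty())
            }
        }
    }
    return merged
}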

expected := expectedTracesPerTestCase[i]
actual := s.findTracesByQuery(t, queryTestCase.Query.ToTraceQueryParams(), expected)
CompareSliceOfTraces(t, expected, actual)
expectedTrace := mergeTraces([][]ptrace.Traces{expected})
Member

mergeTraces performs an unconditional merge, i.e. if you have more than one trace they will all get merged into a single ptrace.Traces object. Is that what you need here? Is it not possible that the query tests return more than one trace?

Contributor Author

Yes, because it's easier to compare traces that way, since the backends might not send traces in the exact expected way. For example, it may be possible that we expect all traces to be returned in the first slice of the result, but backends may send them in different slices. What I have understood is that in a single slice we expect traces of the same trace ID but differing in resource or scope. Even if my assumption is correct, still not all backends support scope. Even leaving that aside, we have normalization for some backends, which makes comparison of slices of traces even more complex. So I thought of merging them into a single trace and then simply comparing.

}
if spanCount(expected) != spanCount(traces) {
t.Logf("Excepting certain number of spans: expected: %d, actual: %d", spanCount(expected), spanCount(traces))
traces = mergeTraces(traceSlice)
Member

similar question - we might be merging multiple traces into one, is that what you want?

@Manik2708 Manik2708 requested a review from yurishkuro January 12, 2026 21:05
@SoumyaRaikwar
Contributor

SoumyaRaikwar commented Jan 13, 2026

@Manik2708, @yurishkuro

I noticed, while checking the scope of issue #7050, that it also mentions: "Incrementally swap test fixtures to be stored as OTLP JSON instead of v1 model JSON".

I am interested in picking up that part of the task (converting the fixtures/traces/*.json files to OTLP). Since you are working on the integration tests, would you mind if I worked on the fixture conversion in parallel? Or did you plan to cover that as well?

Also, my PR #7761, which attempted some of the integration test cleanup, is currently under review, but your changes here seem to cover the logic side more comprehensively with direct ptrace comparison, so should I close it?

@Manik2708
Contributor Author

@Manik2708, @yurishkuro

I noticed, while checking the scope of issue #7050, that it also mentions: "Incrementally swap test fixtures to be stored as OTLP JSON instead of v1 model JSON".

I am interested in picking up that part of the task (converting the fixtures/traces/*.json files to OTLP). Since you are working on the integration tests, would you mind if I worked on the fixture conversion in parallel? Or did you plan to cover that as well?

Also, my PR #7761, which attempted some of the integration test cleanup, is currently under review, but your changes here seem to cover the logic side more comprehensively with direct ptrace comparison, so should I close it?

You can start swapping fixtures after this PR is merged, as before that it is not quite possible. Swapping the first fixture will require some more changes, and after that you can work on the rest. I myself was thinking of asking for help from the community, as the number of fixtures is high and it would take a long time for a single contributor to refactor them all. I will update the issue thread.


Labels

area/storage changelog:ci Change related to continuous integration / testing storage/elasticsearch

3 participants